RAMAC: Multimodal Risk-Aware Offline Reinforcement Learning and the Role of Behavior Regularization
Fukazawa, Kai, Mundada, Kunal, Soltani, Iman
In safety-critical domains where online data collection is infeasible, offline reinforcement learning (RL) offers an attractive alternative, but only if policies deliver high returns without incurring catastrophic lower-tail risk. Prior work on risk-averse offline RL achieves safety through value- or model-based pessimism and restricted policy classes, at the cost of policy expressiveness, whereas diffusion/flow-based expressive generative policies trained with a behavioral-cloning (BC) objective have been used only in risk-neutral settings. We address this gap by introducing the \textbf{Risk-Aware Multimodal Actor-Critic (RAMAC)}, which couples an expressive generative actor with a distributional critic and, to our knowledge, is the first model-free approach that learns \emph{risk-aware expressive generative policies}. RAMAC differentiates a composite objective that adds a Conditional Value-at-Risk (CVaR) term to a BC loss, achieving risk-sensitive learning in complex multimodal scenarios. Since out-of-distribution (OOD) actions are a major driver of catastrophic failures in offline RL, we further analyze OOD behavior under prior-anchored perturbation schemes from recent BC-regularized risk-averse offline RL. This analysis clarifies why a behavior-regularized objective that directly constrains the expressive generative policy to the dataset support provides an effective, risk-agnostic mechanism for suppressing OOD actions in modern expressive policies. We instantiate RAMAC with a diffusion-based actor, using it both to illustrate the analysis on a 2-D risky bandit and to deploy OOD-action detectors on Stochastic-D4RL benchmarks, empirically validating our insights. Across these tasks, we observe consistent gains in $\mathrm{CVaR}_{0.1}$ while maintaining strong returns. Our implementation is available on GitHub: https://github.com/KaiFukazawa/RAMAC.git
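The composite objective admits a compact rendering. Below is a minimal PyTorch sketch of the idea, not the authors' implementation: `actor.sample`, `actor.bc_loss`, and the quantile-valued `critic` are assumed interfaces, and the $\mathrm{CVaR}$ term is estimated as the mean of the lowest $\alpha$-fraction of critic return quantiles.

```python
# Hypothetical sketch of a CVaR + BC composite actor objective in the spirit of
# RAMAC. All interfaces (actor.sample, actor.bc_loss, quantile critic) are
# assumptions for illustration, not the paper's actual API.
import torch

def cvar_from_quantiles(quantiles: torch.Tensor, alpha: float = 0.1) -> torch.Tensor:
    """CVaR_alpha of the return distribution per state: the mean of the lowest
    alpha-fraction of critic quantiles (the lower tail of returns)."""
    n_tail = max(1, int(alpha * quantiles.shape[-1]))
    tail, _ = torch.topk(quantiles, n_tail, dim=-1, largest=False)
    return tail.mean(dim=-1)

def ramac_style_actor_loss(actor, critic, states, dataset_actions,
                           alpha: float = 0.1, eta: float = 1.0) -> torch.Tensor:
    """Composite loss: BC term anchoring the policy to the dataset support,
    minus a risk-sensitive value term (maximize lower-tail return)."""
    actions = actor.sample(states)               # actions from the generative policy
    quantiles = critic(states, actions)          # (batch, n_quantiles) return quantiles
    risk_value = cvar_from_quantiles(quantiles, alpha)
    bc = actor.bc_loss(states, dataset_actions)  # behavioral-cloning regularizer
    return bc - eta * risk_value.mean()
```

For a diffusion actor, differentiating through `actor.sample` (the denoising chain) is the nontrivial step; the sketch glosses over it.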
RRaPINNs: Residual Risk-Aware Physics Informed Neural Networks
Akazan, Ange-Clément, Karambal, Issa, Ngnotchouye, Jean Medard, W., Abebe Geletu Selassie
Physics-informed neural networks (PINNs) typically minimize average residuals, which can conceal large, localized errors. We propose Residual Risk-Aware Physics-Informed Neural Networks (RRaPINNs), a single-network framework that optimizes tail-focused objectives using Conditional Value-at-Risk (CVaR); we also introduce a Mean-Excess (ME) surrogate penalty to directly control worst-case PDE residuals. This casts PINN training as risk-sensitive optimization and links it to chance-constrained formulations. The method is effective and simple to implement. Across several partial differential equations (PDEs), including the Burgers, heat, Korteweg-de Vries, and Poisson equations (the latter including an interface problem with a source jump at x=0.5), RRaPINNs reduce tail residuals while maintaining or improving mean errors compared to vanilla PINNs, Residual-Based Attention, and its convolution-weighted variant; the ME surrogate yields smoother optimization than a direct CVaR hinge. The chance-constraint reliability level $\alpha$ acts as a transparent knob, trading bulk accuracy (lower $\alpha$) for stricter tail control (higher $\alpha$). We discuss the framework's limitations, including memoryless sampling, global-only tail budgeting, and residual-centric risk, and outline remedies via persistent hard-point replay, local risk budgets, and multi-objective risk over BC/IC terms. RRaPINNs offer a practical path to reliability-aware scientific ML for both smooth and discontinuous PDEs.
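The tail-focused terms are easy to sketch. The following is a minimal PyTorch illustration under stated assumptions, not the paper's code: `residuals` are pointwise PDE residuals at collocation points, the CVaR-style loss averages squared residuals above the $\alpha$-quantile, and the ME surrogate penalizes the average excess over that threshold.

```python
# Hypothetical tail-risk residual losses in the spirit of RRaPINNs.
# alpha is the chance-constraint reliability level; the tail is the worst
# (1 - alpha)-fraction of collocation points.
import torch

def cvar_residual_loss(residuals: torch.Tensor, alpha: float = 0.9) -> torch.Tensor:
    """Mean of squared residuals at or above the alpha-quantile (hard tail)."""
    r2 = residuals.pow(2)
    q = torch.quantile(r2, alpha)
    return r2[r2 >= q].mean()

def mean_excess_loss(residuals: torch.Tensor, alpha: float = 0.9) -> torch.Tensor:
    """Smooth surrogate: average excess of squared residuals over the
    alpha-quantile threshold (zero below the threshold)."""
    r2 = residuals.pow(2)
    q = torch.quantile(r2, alpha).detach()  # treat the threshold as a constant
    return torch.relu(r2 - q).mean()

# A training step could combine the usual mean-residual term with a tail term:
# loss = r2.mean() + lam * mean_excess_loss(residuals, alpha)
```

Detaching the threshold is one plausible reading of why an ME surrogate optimizes more smoothly than a hard CVaR hinge: the gradient reduces to a uniform push on the tail points only.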
Policy Gradient for Coherent Risk Measures
Aviv Tamar, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor
Several authors have recently developed risk-sensitive policy gradient methods that augment the standard expected cost minimization problem with a measure of variability in cost. These studies have focused on specific risk measures, such as the variance or conditional value at risk (CVaR). In this work, we extend the policy gradient method to the whole class of coherent risk measures, which is widely accepted in finance and operations research, among other fields. We consider both static and time-consistent dynamic risk measures. For static risk measures, our approach is in the spirit of policy gradient algorithms and combines a standard sampling approach with convex programming. For dynamic risk measures, our approach is actor-critic style and involves explicit approximation of the value function. Most importantly, our contribution presents a unified approach to risk-sensitive reinforcement learning that generalizes and extends previous results.
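For reference, two standard identities behind this abstract, stated here as a textbook reminder rather than quoted from the paper: the dual (risk-envelope) representation of a coherent risk measure, which is what makes the convex-programming step natural, and the Rockafellar-Uryasev variational form of CVaR for a cost $Z$ at tail level $\alpha$.

```latex
% Dual (risk-envelope) representation: a coherent risk measure is a
% worst-case expectation over a convex set U of density perturbations.
\rho(Z) \;=\; \max_{\xi \in \mathcal{U}} \, \mathbb{E}_{P}\!\left[\xi Z\right],
\qquad
\mathcal{U} \subseteq \left\{ \xi \ge 0 \;:\; \mathbb{E}_{P}[\xi] = 1 \right\}.

% Rockafellar-Uryasev variational form of CVaR (cost convention,
% alpha = tail probability), recovering CVaR as a special case of the above:
\mathrm{CVaR}_{\alpha}(Z) \;=\; \min_{\nu \in \mathbb{R}}
\left\{ \nu + \tfrac{1}{\alpha}\,\mathbb{E}\!\left[(Z - \nu)_{+}\right] \right\}.
```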